shreyasNaik0101
Contributor

This PR implements site name aliases.

  • Updates data.schema.json to allow an aliases array.
  • Adds "X" as an alias for "Twitter".
  • Modifies sherlock.py to recognize aliases with the --site filter.

Closes #2373.

Contributor

github-actions bot commented Oct 3, 2025

Automatic validation of changes

| Target | F+ Check | F- Check |
|---|---|---|
| DeviantART | ✔️ Pass | ❌ Fail |
| Twitter | ❌ Fail | ❌ Fail |
| Mydramalist | ❌ Fail | ✔️ Pass |
| Apple Discussions | ✔️ Pass | ✔️ Pass |
| AllMyLinks | ✔️ Pass | ✔️ Pass |

Failures were detected on at least one updated target. Commits containing accuracy failures will often not be merged (unless a rationale is provided, such as false negatives due to regional differences).

Contributor

github-actions bot commented Oct 3, 2025

Automatic validation of changes

| Target | F+ Check | F- Check |
|---|---|---|
| Mydramalist | ❌ Fail | ✔️ Pass |
| AllMyLinks | ✔️ Pass | ✔️ Pass |
| Apple Discussions | ✔️ Pass | ✔️ Pass |
| DeviantART | ✔️ Pass | ❌ Fail |
| Twitter | ❌ Fail | ❌ Fail |

Failures were detected on at least one updated target. Commits containing accuracy failures will often not be merged (unless a rationale is provided, such as false negatives due to regional differences).

Contributor

github-actions bot commented Oct 3, 2025

Automatic validation of changes

| Target | F+ Check | F- Check |
|---|---|---|
| DeviantART | ✔️ Pass | ✔️ Pass |
| Twitter | ❌ Fail | ❌ Fail |
| Mydramalist | ❌ Fail | ✔️ Pass |
| AllMyLinks | ✔️ Pass | ✔️ Pass |
| Apple Discussions | ✔️ Pass | ✔️ Pass |

Failures were detected on at least one updated target. Commits containing accuracy failures will often not be merged (unless a rationale is provided, such as false negatives due to regional differences).

@obiwan04kanobi
Contributor

obiwan04kanobi commented Oct 6, 2025

Hi @shreyasNaik0101!

I tested your PR and found the alias matching wasn't working (--site X was failing). I've fixed the implementation and tested it successfully.

What needs to be changed:

1. sherlock_project/sherlock.py - Update the site selection logic to check aliases

Lines (828 - 864)

    # Create original dictionary from SitesInformation() object.
    # Eventually, the rest of the code will be updated to use the new object
    # directly, but this will glue the two pieces together.
    site_data_all = {site.name: site.information for site in sites}
    if args.site_list == []:
        # Not desired to look at a sub-set of sites
        site_data = site_data_all
    else:
        # User desires to selectively run queries on a sub-set of the site list.
        # Make sure that the sites are supported & build up pruned site database.
        site_data = {}
        site_missing = []
        
        # Create a mapping from all site names and aliases (in lowercase) to their proper names
        site_map = {}
        for site_name, site_info in site_data_all.items():
            site_map[site_name.lower()] = site_name
            if "aliases" in site_info:
                for alias in site_info["aliases"]:
                    site_map[alias.lower()] = site_name

        for site_name_from_user in args.site_list:
            # Find the proper site name from the user's input (which could be an alias)
            proper_site_name = site_map.get(site_name_from_user.lower())
            
            if proper_site_name:
                # If a match was found, add the site's data to our list
                site_data[proper_site_name] = site_data_all[proper_site_name]
            else:
                # If no match was found for the name or any alias
                site_missing.append(f"'{site_name_from_user}'")

        if site_missing:
            print(f"Error: Desired sites not found: {', '.join(site_missing)}.")

        if not site_data:
            sys.exit(1)

2. sherlock_project/resources/data.json - Remove broken Twitter urlProbe (Nitter is dead)

Lines (2186 - 2197)

  "Twitter": {
    "errorMsg": [
      "<div class=\"error-panel\"><span>User ",
      "<title>429 Too Many Requests</title>"
    ],
    "aliases": ["X"],
    "errorType": "message",
    "regexCheck": "^[a-zA-Z0-9_]{1,15}$",
    "url": "https://x.com/{}",
    "urlMain": "https://x.com/",
    "username_claimed": "blue"
  },
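
For anyone who wants to exercise the lookup from step 1 outside of sherlock.py, here is a minimal, self-contained sketch of the same idea. The helper names `build_site_map` and `select_sites`, the two-entry `SAMPLE` manifest, and the `example.com` URL are illustrative assumptions, not code from this PR or from Sherlock itself.

```python
def build_site_map(site_data_all):
    """Map every lowercase site name and alias to its canonical site name."""
    site_map = {}
    for site_name, site_info in site_data_all.items():
        site_map[site_name.lower()] = site_name
        # "aliases" is optional, so fall back to an empty list.
        for alias in site_info.get("aliases", []):
            site_map[alias.lower()] = site_name
    return site_map


def select_sites(site_data_all, requested):
    """Return (selected site data, names that matched no site or alias)."""
    site_map = build_site_map(site_data_all)
    selected, missing = {}, []
    for name in requested:
        proper_name = site_map.get(name.lower())
        if proper_name is None:
            missing.append(name)
        else:
            selected[proper_name] = site_data_all[proper_name]
    return selected, missing


# Hypothetical manifest with only the fields this sketch needs.
SAMPLE = {
    "Twitter": {"aliases": ["X"], "url": "https://x.com/{}"},
    "DeviantART": {"url": "https://example.com/{}"},
}

selected, missing = select_sites(SAMPLE, ["x", "twitter", "Mastodon"])
print(sorted(selected))  # ['Twitter']
print(missing)           # ['Mastodon']
```

Because both the alias and the canonical name resolve to the same key, asking for `x` and `twitter` together selects Twitter only once.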

You can see my working implementation here for reference:
https://github.com/obiwan04kanobi/sherlock/tree/feature/site-name-aliases

Test results after fix:

$ sherlock narendramodi --site Twitter
[+] Twitter: https://x.com/narendramodi

$ sherlock narendramodi --site X  
[+] Twitter: https://x.com/narendramodi

Both work perfectly! ✅

If you'd like, you can apply these changes to your PR. And if you use my code, feel free to add me as a co-author using:

Co-authored-by: obiwan04kanobi <[email protected]>

Let me know if you need any help!

Resolves issue #2402

Contributor

github-actions bot commented Oct 6, 2025

Automatic validation of changes

| Target | F+ Check | F- Check |
|---|---|---|
| Twitter | ❌ Fail | ✔️ Pass |

Failures were detected on at least one updated target. Commits containing accuracy failures will often not be merged (unless a rationale is provided, such as false negatives due to regional differences).

shreyasNaik0101 and others added 2 commits October 6, 2025 23:39
feat: Implement site name aliases
shreyasNaik0101 force-pushed the feature/site-name-aliases branch from 9273137 to 91ba5a4 on October 6, 2025 18:19
Contributor

github-actions bot commented Oct 6, 2025

Automatic validation of changes

| Target | F+ Check | F- Check |
|---|---|---|
| Twitter | ❌ Fail | ✔️ Pass |

Failures were detected on at least one updated target. Commits containing accuracy failures will often not be merged (unless a rationale is provided, such as false negatives due to regional differences).

@shreyasNaik0101
Contributor Author

Hello, I've fixed the JSONDecodeError and the local schema validation is now passing.

The only remaining failure is in test_validate_manifest_against_remote_schema, which is failing because the remote schema doesn't know about the new aliases property.

Could you please guide me on how to resolve this last validation error?

@obiwan04kanobi
Contributor

Hi @shreyasNaik0101,

I analyzed all 12 test failure logs across different platforms and Python versions. Every single test shows the same pattern: 1 failed, 24 passed.

The only failure is test_validate_manifest_against_remote_schema with error:

Additional properties are not allowed ('aliases' was unexpected)

This is the expected failure. The test compares our data.json (which has "aliases": ["X"]) against the production schema (which doesn't have aliases yet because our PR isn't merged).

The PR is technically sound. Once maintainers merge this, the remote schema will be updated and this test will pass. This is standard for schema-changing PRs.
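
To illustrate why the remote check rejects the new key, here is a rough sketch. It assumes the per-site schema forbids undeclared keys via `"additionalProperties": false`, which is what the error message suggests. The trimmed schemas and the `unexpected_properties` helper are hypothetical stand-ins, not the real data.schema.json or the validator the test uses.

```python
def unexpected_properties(entry, schema):
    """Return keys in `entry` that `schema` does not declare, if extras are forbidden."""
    if schema.get("additionalProperties", True) is not False:
        return []  # extra keys are allowed, nothing to report
    return sorted(set(entry) - set(schema.get("properties", {})))


# Hypothetical, heavily trimmed stand-in for the schema currently published.
remote_schema = {
    "type": "object",
    "properties": {"url": {"type": "string"}, "errorType": {"type": "string"}},
    "additionalProperties": False,
}

# The same schema with the "aliases" property this PR adds.
local_schema = {
    **remote_schema,
    "properties": {
        **remote_schema["properties"],
        "aliases": {"type": "array", "items": {"type": "string"}},
    },
}

twitter = {"url": "https://x.com/{}", "errorType": "message", "aliases": ["X"]}

print(unexpected_properties(twitter, remote_schema))  # ['aliases']
print(unexpected_properties(twitter, local_schema))   # []
```

The entry only passes once the schema it is checked against declares `aliases`, which is why the local validation succeeds while the remote one cannot until the schema change is published.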

@ppfeister - Could you take a look when you have a moment? The implementation is working correctly (both --site Twitter and --site X work perfectly), and the only test failure is the expected remote schema validation error for new schema properties. Would love your thoughts on whether this is ready for merge. The alias resolution logic builds a proper site_map with case-insensitive matching, and all functional tests are passing.

@ppfeister
Member

Schema changes are expected to cause some test failures, no worries. This will still be validated independently (once there's time, ofc)

@ppfeister
Member

Dumb issue, but how much work would it be to retain the two-space indent on the schema? Noticing the diff is the entire file, making it a pain to review with line diff (i.e. what GH uses) vs word diff

@obiwan04kanobi
Contributor

@ppfeister Thanks for the quick response! Regarding the indent issue - that's a valid concern. The diff is showing the entire file because the indentation was changed from 2 to 4 spaces.

@shreyasNaik0101 - Would you be able to revert to 2-space indentation for the schema file? That way the diff will only show the actual changes (adding aliases and username_unclaimed properties). This will make the review much cleaner.

The actual functional changes are minimal:

  • Added "aliases": { "type": "array", "items": { "type": "string" } }
  • Added "username_unclaimed": { "type": "string" }

Everything else is just indentation changes.
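
As a rough illustration of why the indent matters for review, the sketch below adds one property to a small stand-in schema and counts the changed lines in a line diff, once with the original 2-space indent and once with 4 spaces. The tiny `schema` dict and the `changed_lines` helper are assumptions made for this example. It also assumes the file is rewritten with `json.dumps`, which may not be how the schema was actually edited.

```python
import difflib
import json

# Hypothetical stand-in for the real schema file.
schema = {
    "type": "object",
    "properties": {
        "url": {"type": "string"},
        "errorType": {"type": "string"},
    },
}


def changed_lines(before, after):
    """Count added or removed lines in a unified line diff of two texts."""
    diff = difflib.unified_diff(before.splitlines(), after.splitlines(), lineterm="")
    return sum(
        1
        for line in diff
        if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
    )


original = json.dumps(schema, indent=2)

# Apply the one functional change: declare the new "aliases" property.
updated = json.loads(original)
updated["properties"]["aliases"] = {"type": "array", "items": {"type": "string"}}

same_indent = changed_lines(original, json.dumps(updated, indent=2))
new_indent = changed_lines(original, json.dumps(updated, indent=4))
print(same_indent, new_indent)
```

With the 2-space indent kept, only the lines around the new property show up. With 4 spaces, nearly every line in the file is reported as changed.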

@ppfeister
Member

I haven't reviewed the full diff, but it's worth noting that username_unclaimed was deprecated quite a while ago and should not be re-added

Will give a more proper review when actually at a desk and free

# Create original dictionary from SitesInformation() object.
# Eventually, the rest of the code will be updated to use the new object
# directly, but this will glue the two pieces together.
site_data_all = {site.name: site.information for site in sites}
Contributor

On lines 786-789:

@shreyasNaik0101 Small cleanup needed - this line appears to be a duplicate of line 786. Can you remove the duplicate?

site_data_all = {site.name: site.information for site in sites}
# Create original dictionary from SitesInformation() object.  ← Duplicate
# Eventually, the rest of the code will be updated to use the new object  ← Duplicate
# directly, but this will glue the two pieces together.  ← Duplicate
site_data_all = {site.name: site.information for site in sites}  ← Duplicate

One of these should be removed. Thanks!

Contributor

github-actions bot commented Oct 6, 2025

Automatic validation of changes

| Target | F+ Check | F- Check |
|---|---|---|
| Twitter | ❌ Fail | ✔️ Pass |

Failures were detected on at least one updated target. Commits containing accuracy failures will often not be merged (unless a rationale is provided, such as false negatives due to regional differences).

@shreyasNaik0101
Contributor Author

I’ve implemented the suggested changes. Please review and let me know if any further modifications are required.
